THEORETICAL ANALYSIS OF DYNAMIC CLUSTERING

By Xiaogang Wang 

A theoretical framework is derived to investigate the convergence and stability of dynamic clustering methods, which transform data according to different laws of attraction to achieve autonomous partitions. Applying the conservation law, we establish partial differential equations that prescribe the successive transformations of the underlying probability densities in dynamic clustering. These partial differential equations correspond to anti-diffusion processes and are solved analytically. We show that a broad class of unsupervised shrinking or clustering methods, including the mean-shift algorithm, is intrinsically unstable except for independent normal densities. Theoretical results for supervised dynamic clustering processes indicate that the supervision must be chosen judiciously to ensure correct convergence, since a universally optimal supervising function does not exist.

1. Introduction. Clustering is the process of partitioning a set of objects into subgroups according to a certain measure of similarity. Cluster analysis has many applications in data mining, in which large data sets, such as biological or climate data, need to be partitioned into much smaller, homogeneous groups. Hastie et al. (2001) and Han and Kamber (2006) both offer excellent reviews of clustering algorithms, with different emphases.

For many classical clustering algorithms, such as K-means (MacQueen 1967; Hartigan and Wong 1979) and PAM (Kaufman and Rousseeuw 1990), the number of clusters or sub-populations must be specified by the user. One approach is to select the number of clusters by optimizing a certain measure of the strength of the clusters (Tibshirani, Walther and Hastie 2000; Fraley and Raftery 2002). An alternative is to first partition the data into many small clusters and then merge these small clusters until no clusters can be merged (Frigui and Krishnapuram 1999). A third strategy is to extract one cluster at a time (Zhung et al. 1996). The determination of the number of clusters, however, remains a challenging problem when the clusters assume non-normal shapes with blurred or even slightly overlapping boundaries. It can also be very difficult to specify the exact functional form of the underlying probability density function, due to the uncertainty involved in the clustering process.

Many automatic or non-parametric clustering algorithms have emerged from various disciplines in the last twenty years, with a wide range of applications to pattern recognition and image analysis. Although there are some minor technical or operational differences, they all treat data points as autonomous agents or particles and iteratively move them toward cluster centers or focal points. One approach, gravitational clustering (Wright 1977; Kundu 1999; Sato 2000; Wang and Rau 2001), treats each data point as a particle of unit mass with zero initial velocity that moves gradually toward a cluster center under the law of gravitation. The theoretical properties of gravitational clustering, however, have not been fully established, although the original idea was proposed thirty years ago.

Meanwhile, non-parametric methods have received increasing attention in the literature due to their flexible and adaptive nature. The most famous and representative method is the so-called mean-shift algorithm proposed by Fukunaga and Hostetler (1975), Cheng (1995), and Comaniciu and Meer (2002). Given a kernel function $K$ and a weight function $w$, the generalized mean-shift operation is given by

$$T(x) = \frac{\sum_{s \in S} K(x,s)\, w(s)\, s}{\sum_{s \in S} K(x,s)\, w(s)}, \tag{1.1}$$

where the sums run over the points $s$ of the data set $S$.



It originates from the idea of following the gradient in kernel density estimation, since data points are transformed toward denser regions by a functional of the kernel functions. There are many variations of this algorithm. For example, Virmajoki (2002), Shi et al. (2005) and Wang et al. (2007a) propose algorithms that closely resemble the main idea of the mean-shift method. Although the data sharpening procedure proposed by Choi and Hall (1999) was originally designed to reduce bias by pushing data points at the boundary a bit closer to the center, its movements also resemble those of the mean-shift method. Woolford and Braun (2006) apply the data sharpening method to the identification and tracking of spatio-temporal centers of lightning activity.
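To make (1.1) concrete, here is a minimal sketch in Python with NumPy. It assumes a Gaussian kernel $K(x,s) = \exp(-\|x-s\|^2/(2h^2))$ with a hypothetical bandwidth $h$ and, by default, unit weights $w(s) = 1$; the text above leaves $K$ and $w$ general, and the function names are our own. The second function moves every point at once and then uses the moved points as the new data, the "blurring" setting in which all data points move simultaneously and the underlying density itself is transformed.

```python
import numpy as np

def mean_shift_step(x, data, bandwidth, weights=None):
    """Evaluate T(x) from (1.1) at one point x (shape (d,)).

    Assumes a Gaussian kernel K(x, s) = exp(-||x - s||^2 / (2 h^2));
    the choice of kernel is an illustration, not part of (1.1).
    """
    data = np.asarray(data, dtype=float)
    if weights is None:
        weights = np.ones(len(data))          # w(s) = 1 for every data point
    sq_dist = np.sum((data - x) ** 2, axis=1)
    kw = np.exp(-sq_dist / (2.0 * bandwidth ** 2)) * weights
    # Kernel-weighted average of the data points
    return kw @ data / kw.sum()

def blurring_mean_shift(data, bandwidth, n_iter=50):
    """Move all points simultaneously; the moved points become the new data."""
    points = np.asarray(data, dtype=float)
    for _ in range(n_iter):
        points = np.array([mean_shift_step(p, points, bandwidth) for p in points])
    return points
```

For two points at 0 and 1 with $h = 1$, the first step moves the point at 0 to $e^{-1/2}/(1+e^{-1/2}) \approx 0.378$, and the blurring iteration pulls both points to their midpoint 0.5 within a few steps.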

Although non-parametric clustering methods are appealing to practitioners and have been applied in many research areas, the underlying process is not well understood, since the initial probability density function undergoes continuous nonlinear transformations. Cheng (1995) pointed out that it is difficult to see where the mean-shift method leads, since all data points are moving simultaneously. The same is true of almost all non-parametric dynamic clustering methods. Although the local properties of these algorithms are intuitively clear, their intrinsic global patterns have proven difficult to establish. The major obstacle to gaining insight into the validity of these methods is the complex spatial and temporal patterns embedded within them.

In this article, we develop an analytical framework for dynamic clustering that prescribes the time-space evolution of the initial probability density function. The mathematical properties derived from the framework provide guidance on, and a deterministic description of, complex dynamic evolutions, with clear criteria to evaluate convergence and reliability for a general class of dynamic clustering methods. The framework is derived by modeling the successive nonlinear transformations of the underlying probability density function with the general conservation law, which describes the conservation of some basic physical quantity in a closed system. The dynamic clustering process is then characterized by studying the time-space evolution of the initial probability density function. The differential form of the conservation law derived from the dynamic clustering scheme is a class of second-order partial differential equations (PDEs) that are anti-diffusive in nature. Applying this framework, we find that a broad class of unsupervised dynamic clustering methods converges only for normal densities with independent structures. We prove that the entropy of the corresponding clustering process is non-increasing through time, a direct violation of the second law of thermodynamics. Consequently, these clustering processes are unnatural and could exhibit chaotic behavior locally at any given time. Supervised clustering should therefore be preferred in order to ensure general and correct convergence. Theoretical results for supervised dynamic clustering indicate that a universally effective supervising function does not exist, so the supervising function should be chosen judiciously for each specific application.

This article is organized as follows. We present the general framework for dynamic clustering in Section 2. Section 3 presents a theoretical analysis of unsupervised dynamic clustering. The convergence of supervised dynamic clustering is established in Section 4. A discussion is provided in Section 5.

2. Theoretical Framework for Dynamic Clustering. We establish a theoretical framework that prescribes recurrent regularities during the shrinking or clustering processes.

2.1. Evolutions of Densities. In dynamic clustering, each data point is regarded as a particle in a gravitational field or as an autonomous agent governed by certain laws of attraction. Although these clustering methods are realized in discrete steps, one can imagine an underlying continuous process when the total number of iterations is very large. In such a continuous process, each data point travels continuously toward a local target at any given time, so the underlying probability density function undergoes a continuous transformation. Consequently, this kind of process can be modeled by describing the patterns in the evolution of densities.

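We now define a family of probability density functions $f(x;t)$, indexed by time $t$, where $f(x;t)$ is the density of the data points at location $x$ and time $t$.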

The space-time evolution of a dynamic clustering process can therefore be modeled using probability densities that are also functions of time. Instead of modeling the individual movements of particular data points, one can examine the pattern of transformations of the underlying probability distribution. We now develop an analytical framework based on some fundamental principles of these processes.

2.2. One-Dimensional Conservation Law. We note that the total number of data points remains the same regardless of the law of attraction used. Even if some merging strategy is employed, a data point should not lose its deserved influence on the partition process, to avoid a biased or inaccurate estimate of the underlying probability density function. The conservation law is therefore applicable: the rate of change of the total number of particles contained in a fixed volume equals the net influx of particles, or data points, through its boundary.

To illustrate, we look at the one-dimensional case. Denote the one-dimensional influx of data points by $q(x;t)$ and the probability density at time $t$ by $f(x;t)$. We then have

$$q(x;t) = u(x;t)\, f(x;t), \tag{2.2}$$

where $u(x;t)$ is the speed of particles at location $x$ and time $t$.

We can imagine a constant flow of data points passing through an arbitrarily small interval due to the laws of attraction in use. The data points are assumed to be incompressible, with constant velocity within this small region. Using the standard argument in fluid dynamics, the one-dimensional conservation law is then given by

$$\frac{\partial f(x;t)}{\partial t} + \frac{\partial q(x;t)}{\partial x} = 0. \tag{2.3}$$

It characterizes the functional connection between the probability density function and the influx of data points at a given location and time, and it is the fundamental building block of an analytical framework that describes the space-time evolution of the probability density function.
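The conservation law (2.3) can be illustrated numerically with a standard finite-volume scheme. The sketch below is our own illustration, not a method from this article: it assumes a hypothetical attraction speed $u(x) = -x$ toward the origin, a grid on $[-4, 4]$, and upwind fluxes $q = u f$. Because the density in each cell changes only through the fluxes across its two faces, and the outer faces carry no flux, the total mass $\int f\,dx$ is preserved up to rounding while the distribution contracts.

```python
import numpy as np

def conservative_step(f, u_faces, dx, dt):
    """Advance the 1-D conservation law (2.3) by one finite-volume step.

    f       : cell averages of the density, shape (N,)
    u_faces : particle speed u at the N + 1 cell faces
    The two outer faces carry zero flux, so no data points enter or leave.
    """
    u_in = u_faces[1:-1]
    # Upwind choice of the density that is carried across each interior face
    q_in = np.where(u_in > 0, u_in * f[:-1], u_in * f[1:])
    q = np.concatenate(([0.0], q_in, [0.0]))
    # Each cell changes only by (inflow - outflow) through its two faces
    return f - dt / dx * (q[1:] - q[:-1])

N, L = 200, 4.0
faces = np.linspace(-L, L, N + 1)
dx = faces[1] - faces[0]
centers = 0.5 * (faces[:-1] + faces[1:])
f0 = np.exp(-centers ** 2 / 2) / np.sqrt(2 * np.pi)   # standard normal start
u_faces = -faces          # hypothetical attraction toward the origin
dt = 0.005                # max |u| * dt / dx is about 0.5, within the upwind stability limit

f = f0.copy()
for _ in range(100):      # advance to t = 0.5
    f = conservative_step(f, u_faces, dx, dt)

mass0, mass = f0.sum() * dx, f.sum() * dx
var0 = (centers ** 2 * f0).sum() * dx / mass0
var = (centers ** 2 * f).sum() * dx / mass
```

The mass is unchanged while the variance drops from about 1 to roughly $e^{-1}$ plus a small amount of numerical diffusion, which is the kind of contraction dynamic clustering produces.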

2.3. The General Partial Differential Equations. We now present the general differential form, which incorporates all spatial dimensions. Denote the influx vector by $q(x;t)$ and the probability density function by $f(x;t)$. For an arbitrary fixed volume $V$ with boundary surface $S$, we then have

$$\frac{d}{dt} \int_V f(x;t)\, dV = -\oint_S (q \cdot n)\, dS, \tag{2.4}$$

where $dV$ is the volume element, $dS$ is the surface element of the boundary surface $S$, and $n$ denotes the outward unit normal vector to $S$. The surface integral measures the outward flux, so the minus sign makes the right-hand side the net inflow into $V$.

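Applying the Gauss divergence theorem and taking $d/dt$ inside the integral on the left-hand side, we then have

$$\int_V \left( \frac{\partial f(x;t)}{\partial t} + \nabla \cdot q(x;t) \right) dV = 0, \tag{2.5}$$

where $\nabla \cdot$ is the divergence operator, given by

$$\nabla \cdot q(x,t) = \sum_{i=1}^{n} \frac{\partial q_i(x,t)}{\partial x_i}, \tag{2.6}$$

and the $q_i$'s are the coordinate functions of $q(x,t)$.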

Since the result is valid for any arbitrary volume $V$, the integrand must be zero if it is continuous. The differential form of the general conservation law is then given by

$$\frac{\partial f(x,t)}{\partial t} + \nabla \cdot q(x,t) = 0, \tag{2.7}$$

where $q(x;t) = u(x;t)\, f(x;t)$. Detailed derivations and discussions of conservation laws and their differential forms can be found in Debnath (2004).

For supervised dynamic clustering, the trajectories of data points are also influenced or dictated by the supervision. This is equivalent to imposing a source or sink function in the process, since data points will be "absorbed" in a given domain. Denote the supervision function, or sink function, by $\psi(x,t)$. Following a similar argument, the conservation law with a sink function is given by

$$\frac{\partial f(x,t)}{\partial t} + \nabla \cdot q(x,t) = \psi(x,t). \tag{2.8}$$
This framework is applicable to many dynamic clustering processes, as long as data points are not lost in the partition process. It provides a much-needed analytical framework and establishes the foundation for our subsequent theoretical analysis of dynamic clustering methods. In the next section we establish theoretical results for unsupervised dynamic clustering methods, including the well-known mean-shift method.

3. Properties of Unsupervised Dynamic Clustering.

3.1. Unsupervised Dynamic Clustering. In unsupervised dynamic clus- 
tering methods, the movements of data points depend on the functional 
connection between the current probability density function and its gradient 
or first order derivative. The movements of many non-parametric dynamic 
clustering methods are often governed by a law such that a data point will 
move to the center along more or less the direction of the gradient adjusted 
by the value of the current density function at the point of interest. 

This can be formulated mathematically as

$$u(x;t) = a^2\, \frac{\nabla f(x;t)}{f(x;t)}. \tag{3.1}$$

The gradient component forces each data point to optimize its trajectory toward a local mode or cluster center, which is known as the mode-seeking property. The movement is also proportional to the reciprocal of the current probability density. This implies that data points in sparsely populated areas travel longer distances than data points in densely populated areas, even if the gradient takes the same value at the two locations.

Following the argument in Cheng (1995), one can show that the mean-shift algorithm indeed belongs to this category. All variations or improved versions based on the mean-shift method therefore belong to this category as well. As an alternative to the traditional movements in the mean-shift method, data points could move to the local center given by the conditional mean:

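$$T(x^*) = \frac{\int_{B(x^*,d)} \xi\, f(\xi)\, d\xi}{\int_{B(x^*,d)} f(\xi)\, d\xi}, \tag{3.2}$$

where $B(x^*,d)$ is a neighborhood with center $x^*$ and radius $d$. Wang et al. (2007b) showed that the resulting displacement $T(x^*) - x^*$ is proportional to $\nabla f(x^*)/f(x^*)$, up to a remainder of order $o(d^3)$. Since a $K$-nearest neighbor approach provides a natural estimate of a conditional mean, it also satisfies equation (3.1).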
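The link between the conditional-mean movement (3.2) and the score $\nabla f/f$ in (3.1) can be checked numerically in one dimension. The sketch below assumes a standard normal density, for which $f'(x)/f(x) = -x$, and the window $B(x^*,d) = [x^*-d, x^*+d]$. The constant $d^2/3$ is the standard second-order Taylor constant for such a one-dimensional window; it is our assumption, not necessarily the constant stated by Wang et al. (2007b).

```python
import math

def normal_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def conditional_mean_shift(x, d):
    """T(x) - x for X ~ N(0, 1) and the window [x - d, x + d]."""
    lo, hi = x - d, x + d
    # Closed-form E[X | lo < X < hi] for a standard normal
    cond_mean = (normal_pdf(lo) - normal_pdf(hi)) / (normal_cdf(hi) - normal_cdf(lo))
    return cond_mean - x

def score_prediction(x, d):
    """Second-order prediction (d^2 / 3) * f'(x) / f(x), with f'/f = -x here."""
    return d * d / 3.0 * (-x)
```

At $x^* = 1$ and $d = 0.2$ the exact displacement is about $-0.0132$ against a prediction of $-0.0133$; the relative gap shrinks roughly like $d^2$ as the window narrows, so the movement is, to leading order, of the form (3.1).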

3.1.1. Convergence Analysis for the One-Dimensional Case. By virtue of equation (2.3) and assumption (3.1), the corresponding differential form is

$$\frac{\partial f(x;t)}{\partial t} = -a^2\, \frac{\partial^2 f(x;t)}{\partial x^2}, \tag{3.4}$$

where $a > 0$ and $f(x,0) = \phi_0(x)$, the initial probability density function.

This is a one-dimensional anti-diffusion equation. Anti-diffusion processes are rare in reality and do not appear often in the literature, except for some special types such as the crystallization process. Moreover, anti-diffusion equations have characteristics quite different from those of diffusion and other equations. We now present the exact analytical solution to this differential equation.



Theorem 3.1. Under assumption (3.1), the one-dimensional anti-diffusion equation (3.4) has a unique solution, given by

$$f(x;t) = \frac{1}{\sqrt{-4\pi a^2 t}} \int_{-\infty}^{\infty} \phi_0(\xi)\, e^{(x-\xi)^2/(4a^2 t)}\, d\xi, \qquad t < 0, \tag{3.5}$$

where $\phi_0(x) = f(x;0)$ is the initial probability density function.
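Proof: Consider the Fourier transform of $f(x;t)$ and its inverse:

$$F(\omega;t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x;t)\, e^{i\omega x}\, dx, \qquad f(x;t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} F(\omega;t)\, e^{-i\omega x}\, d\omega.$$

First, observe that

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{\partial f(x;t)}{\partial t}\, e^{i\omega x}\, dx = \frac{\partial}{\partial t} F(\omega;t).$$

Furthermore, integrating by parts twice and assuming that $f$ and $\partial f/\partial x$ vanish as $|x| \to \infty$, we have

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{\partial^2 f(x;t)}{\partial x^2}\, e^{i\omega x}\, dx = \frac{1}{\sqrt{2\pi}} \left[ e^{i\omega x}\, \frac{\partial f(x;t)}{\partial x} \right]_{-\infty}^{\infty} - \frac{i\omega}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{i\omega x}\, \frac{\partial f(x;t)}{\partial x}\, dx = -\omega^2 F(\omega;t).$$

It then follows from equation (3.4) that

$$\frac{\partial}{\partial t} F(\omega;t) - a^2 \omega^2 F(\omega;t) = 0. \tag{3.6}$$

The initial condition also gives

$$\Phi_0(\omega) = F(\omega;0) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \phi_0(x)\, e^{i\omega x}\, dx. \tag{3.7}$$

Consequently, equation (3.6) has the solution

$$F(\omega;t) = \Phi_0(\omega)\, e^{a^2 \omega^2 t}. \tag{3.8}$$

It then follows that

$$f(x;t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \Phi_0(\omega)\, e^{a^2 \omega^2 t - i\omega x}\, d\omega. \tag{3.9}$$

Note that the domain of convergence of the above integral is $t \in (-\infty, 0)$. Substituting (3.7) and interchanging the order of integration, we then have

$$\begin{aligned} f(x;t) &= \frac{1}{2\pi} \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} \phi_0(\xi)\, e^{i\omega\xi}\, d\xi \right) e^{a^2\omega^2 t - i\omega x}\, d\omega \\ &= \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi_0(\xi) \left( \int_{-\infty}^{\infty} e^{a^2\omega^2 t - i\omega(x-\xi)}\, d\omega \right) d\xi \\ &= \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi_0(\xi)\, \sqrt{\frac{\pi}{-a^2 t}}\; e^{(x-\xi)^2/(4a^2 t)}\, d\xi \\ &= \frac{1}{\sqrt{-4\pi a^2 t}} \int_{-\infty}^{\infty} \phi_0(\xi)\, e^{(x-\xi)^2/(4a^2 t)}\, d\xi, \qquad t < 0. \end{aligned}$$

The third equality uses the Gaussian integral $\int_{-\infty}^{\infty} e^{-A\omega^2 - i\omega b}\, d\omega = \sqrt{\pi/A}\; e^{-b^2/(4A)}$ with $A = -a^2 t > 0$. □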



The fact that the solution exists only for $t < 0$ implies that the evolution of the densities is a deterministic causal process: given the current state, there is only one process that could have produced what has occurred. This turns out to be a powerful tool for the convergence analysis of dynamic clustering methods, as shown in the next theorem.
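The restriction to $t < 0$ comes from the Fourier multiplier $e^{a^2\omega^2 t}$ in (3.8). The short sketch below (the function name, $a = 1$ and the frequencies are illustrative choices of ours) evaluates this multiplier: for $t < 0$ every mode is damped, more strongly at high frequency, whereas for $t > 0$ high-frequency modes grow without bound. The integral (3.9) therefore diverges for $t > 0$ whenever $\Phi_0$ does not decay fast enough, for example when $\phi_0$ is a point mass and $\Phi_0$ is constant.

```python
import numpy as np

def fourier_multiplier(omega, t, a=1.0):
    """Factor e^{a^2 omega^2 t} that multiplies Phi_0(omega) in (3.8)."""
    omega = np.asarray(omega, dtype=float)
    return np.exp(a ** 2 * omega ** 2 * t)

omega = np.array([0.0, 1.0, 5.0, 10.0])
damped = fourier_multiplier(omega, t=-0.1)     # t < 0: high frequencies are suppressed
amplified = fourier_multiplier(omega, t=0.1)   # t > 0: high frequencies blow up
# damped    ≈ [1, 0.905, 0.0821, 4.54e-05]
# amplified ≈ [1, 1.105, 12.18, 2.2e+04]
```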

Theorem 3.2. Under assumption (3.1), convergence to a location $\mu_0$ can occur only for normal densities with mean $\mu_0$ and variance proportional to $a^2$. In addition, the first-order derivative of the variance with respect to time is

$$\frac{d\sigma_t^2}{dt} = -2a^2, \tag{3.10}$$

where $\sigma_t^2$ denotes the variance of the normal density at time $t$. The converging speed of a data point at location $x$ and time $t$ is

$$u(x;t) = \frac{x - \mu_0}{2t}, \qquad t < 0. \tag{3.11}$$

Proof: Convergence at time $t = 0$ to a location $\mu_0$ implies that $\phi_0 = \delta(x - \mu_0)$. By Theorem 3.1, it then follows that

$$f(x;t) = \frac{1}{\sqrt{-4\pi a^2 t}}\; e^{(x-\mu_0)^2/(4a^2 t)}, \qquad t < 0. \tag{3.12}$$

This is a normal density with mean $\mu_0$ and variance $\sigma_t^2 = -2a^2 t$. Therefore $d\sigma_t^2/dt = -2a^2$. Substituting (3.12) into (3.1) gives $u(x;t) = a^2\, \partial \log f(x;t)/\partial x = (x - \mu_0)/(2t)$. □

Theorem 3.2 effectively states that, in the one-dimensional case, convergence can occur only for a normal density. The spread of the normal densities at different time points depends on the magnitude of contraction forced by the attraction law. We show that this result also holds in higher-dimensional spaces when there are multiple cluster centers.
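Theorem 3.2 can be checked numerically. The sketch below, with illustrative values $a = 1$ and $\mu_0 = 0$ of our choosing, evaluates the density (3.12), confirms by central finite differences that it satisfies the anti-diffusion equation (3.4), and compares the speed $a^2 f'/f$ from (3.1) with the closed form (3.11).

```python
import math

A_COEF = 1.0   # assumed value of a
MU0 = 0.0      # assumed convergence point mu_0

def f_normal(x, t, mu0=MU0, a=A_COEF):
    """Density (3.12): normal with mean mu0 and variance -2 a^2 t (requires t < 0)."""
    var = -2.0 * a * a * t
    return math.exp(-(x - mu0) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def pde_residual(x, t, a=A_COEF, h=1e-3):
    """Central-difference value of df/dt + a^2 d2f/dx2, which (3.4) requires to be 0."""
    f_t = (f_normal(x, t + h, a=a) - f_normal(x, t - h, a=a)) / (2.0 * h)
    f_xx = (f_normal(x + h, t, a=a) - 2.0 * f_normal(x, t, a=a)
            + f_normal(x - h, t, a=a)) / h ** 2
    return f_t + a * a * f_xx

def speed_from_density(x, t, a=A_COEF, h=1e-6):
    """u = a^2 (df/dx) / f as in (3.1), with a central difference for df/dx."""
    f_x = (f_normal(x + h, t, a=a) - f_normal(x - h, t, a=a)) / (2.0 * h)
    return a * a * f_x / f_normal(x, t, a=a)

def speed_closed_form(x, t, mu0=MU0):
    """Converging speed (3.11): u = (x - mu0) / (2 t)."""
    return (x - mu0) / (2.0 * t)
```

At $x = 1$ and $t = -0.5$ both speeds equal $-1$: the point moves toward $\mu_0 = 0$, and its speed grows as $t \to 0^-$.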

3.1.2. Convergence in Multi-dimensional Space. Under assumption (3.1), it follows that

$$q(x,t) = u(x,t)\, f(x,t) = a^2\, \nabla f(x,t). \tag{3.13}$$

Consequently, equation (2.7) becomes

$$\frac{\partial f(x,t)}{\partial t} + a^2 \nabla^2 f(x,t) = 0, \tag{3.14}$$

where $\nabla^2 f = \sum_{i=1}^{n} \partial^2 f/\partial x_i^2$ is the Laplacian of $f$, and the boundary condition is $f(x,0) = f_0(x)$.

The one-dimensional result can be generalized to the multi-dimensional case with multiple cluster centers.
Theorem 3.3. Under assumption (3.1), the anti-diffusion equation

$$\frac{\partial f(x,t)}{\partial t} + a^2 \nabla^2 f(x,t) = 0, \tag{3.15}$$

with the boundary condition $f|_{t=0} = \phi_0$, has the solution

$$f(x,t) = (-4\pi a^2 t)^{-n/2} \int_{\mathbb{R}^n} \phi_0(\eta)\, e^{(\eta - x)^2/(4a^2 t)}\, d\eta, \qquad t < 0, \tag{3.16}$$

where $(\eta - x)^2 = \sum_{i=1}^{n} (\eta_i - x_i)^2$.

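Proof: The dynamic shrinking or clustering is characterized by

$$\frac{\partial f(x,t)}{\partial t} + a^2 \nabla^2 f(x,t) = 0, \tag{3.17}$$

with boundary condition $f(x,0) = \phi_0(x)$, a probability density function. Denote the $n$-dimensional Fourier transform by

$$F_n(s;t) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x,t)\, e^{i s \cdot x}\, dx_1\, dx_2 \cdots dx_n, \tag{3.18}$$

where $s \cdot x = \sum_{i=1}^{n} s_i x_i$. Applying the Fourier transform to both sides of equation (3.17), we have

$$\frac{\partial F_n(s,t)}{\partial t} - a^2 \|s\|^2 F_n(s,t) = 0, \tag{3.19}$$

where $\|s\|^2 = \sum_{i=1}^{n} s_i^2$. Thus, the solution of the above equation is

$$F_n(s,t) = F_n(s,0)\, e^{a^2 \|s\|^2 t}, \tag{3.20}$$

where

$$F_n(s,0) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \int_{\mathbb{R}^n} \phi_0(x)\, e^{i s \cdot x}\, dx. \tag{3.21}$$

Using the inverse Fourier transform, we then have

$$f(x,t) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \int_{\mathbb{R}^n} F_n(s,0)\, e^{a^2 \|s\|^2 t - i s \cdot x}\, ds. \tag{3.22}$$

It then follows that

$$f(x,t) = \left(\frac{1}{2\pi}\right)^{n} \int_{\mathbb{R}^n} \left( \int_{\mathbb{R}^n} \phi_0(\eta)\, e^{i s \cdot \eta}\, d\eta \right) e^{a^2 \|s\|^2 t - i s \cdot x}\, ds, \tag{3.23}$$

where $\eta = (\eta_1, \eta_2, \ldots, \eta_n)$. By rearranging the order of integration and simplifying in the same way as in Theorem 3.1, one coordinate at a time, we then have

$$f(x,t) = (-4\pi a^2 t)^{-n/2} \int_{\mathbb{R}^n} \phi_0(\eta)\, e^{(\eta - x)^2/(4a^2 t)}\, d\eta, \qquad t < 0, \tag{3.24}$$

where $(\eta - x)^2 = \sum_{i=1}^{n} (\eta_i - x_i)^2$. □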

i=l 

The analytic solution allows us to retrace the stages of convergence and recover the deterministic pattern of evolution of the initial probability density function. We now show that the family of probability density functions that guarantees convergence is the family of multivariate normal distributions with independent components.

Theorem 3.4. Under assumption (3.1), dynamic shrinking or clustering converges to $m$ distinct cluster centers if and only if the initial density function is a mixture of normal distributions with equal variances, i.e.,

$$f(x,t) = \sum_{i=1}^{m} \lambda_i\, \phi_i(x,t), \qquad t < 0, \tag{3.25}$$

where $\phi_i$ is the normal density function with mean $\mu_i$ and variance $-2a^2 t$ in each coordinate.

Proof: Suppose the dynamic process converges to a finite number of focal points, i.e.,

$$\phi_0(\eta) = \sum_{j=1}^{m} \lambda_j\, \delta(\eta - \mu_j), \qquad \lambda_j > 0, \quad \sum_{j=1}^{m} \lambda_j = 1, \tag{3.26}$$

where $\mu_j = (\mu_{j1}, \mu_{j2}, \ldots, \mu_{jn})$ and $\delta(\eta - \mu_j) = \prod_{i=1}^{n} \delta(\eta_i - \mu_{ji})$. It then follows from Theorem 3.3 that

$$f(x,t) = \sum_{j=1}^{m} \lambda_j \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{n} e^{-(x - \mu_j)^2/(2\sigma^2)}, \tag{3.27}$$

where $\sigma^2 = -2a^2 t$, $t < 0$. □

This theorem also implies that, for correct convergence, the contraction rates of dynamic shrinking or clustering must be homogeneous in all directions. A heterogeneous contraction pattern therefore implies non-convergence.
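Theorem 3.4 can be illustrated by moving particles with the speed (3.1) under the equal-variance mixture (3.25). In the sketch below (our own illustration; the two centers, the weights, $a = 0.5$, the random seed and the step size are all assumed values), particles are drawn from the mixture at $t = -1$ and transported toward $t = 0$. For this mixture, $a^2 \nabla f/f = \sum_j r_j (x - \mu_j)/(2t)$, where $r_j = \lambda_j \phi_j / f$ are the component weights at $x$. Writing $t = -e^{s}$ turns the flow into $dx/ds = \frac{1}{2}\sum_j r_j (x - \mu_j)$, which stays well behaved as $t \to 0^-$, and it is integrated with explicit Euler steps.

```python
import numpy as np

def responsibilities(x, mus, lams, var):
    """Weights r_j = lam_j phi_j(x) / f(x) for an isotropic normal mixture.

    x: (N, d) particles, mus: (m, d) centers, lams: (m,) mixture weights,
    var: common variance of every coordinate of every component.
    """
    sq = ((x[:, None, :] - mus[None, :, :]) ** 2).sum(axis=-1)   # (N, m)
    logw = np.log(lams)[None, :] - sq / (2.0 * var)
    logw -= logw.max(axis=1, keepdims=True)   # stabilise before exponentiating
    w = np.exp(logw)
    return w / w.sum(axis=1, keepdims=True)

def flow_to_centers(x, mus, lams, a=0.5, s_end=-12.0, ds=0.01):
    """Transport particles with u = a^2 grad f / f from t = -1 toward t = 0."""
    x = np.array(x, dtype=float)
    s = 0.0                                   # s = 0 corresponds to t = -1
    while s > s_end:
        var = 2.0 * a * a * np.exp(s)         # sigma^2 = -2 a^2 t with t = -e^s
        r = responsibilities(x, mus, lams, var)
        dxds = 0.5 * (x - r @ mus)            # = 0.5 * sum_j r_j (x - mu_j)
        x = x - dxds * ds                     # s decreases by ds, so t moves toward 0
        s -= ds
    return x

rng = np.random.default_rng(0)
mus = np.array([[-3.0, 0.0], [3.0, 0.0]])
lams = np.array([0.5, 0.5])
a = 0.5
labels = rng.integers(0, 2, size=200)
sigma = np.sqrt(2.0 * a * a)                  # standard deviation at t = -1
x0 = mus[labels] + sigma * rng.standard_normal((200, 2))
x_final = flow_to_centers(x0, mus, lams, a=a)
```

Every particle ends within a few thousandths of one of the two centers, with the same contraction in both coordinates, matching the homogeneous contraction described above.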
3.2. Instability of Unsupervised Dynamic Shrinking. Although this general class of dynamic clustering methods seems intuitively appealing, and practical convergence might be observed, our theoretical analysis reveals that these methods do not converge correctly except for normal densities with independent structure. Therefore, an observed convergence of unsupervised dynamic clustering for non-normal densities, or for normal densities with non-trivial correlation structures, is either an illusion or an artificial and incorrect one. For clusters that are well separated in low dimensions, correct convergence could occur if the smoothing parameter is set suitably large to capture local patterns accurately. However, there is little hope of this kind of success for high-dimensional data when the manifolds are complex and not well separated.

In order to understand this possible instability more precisely, we examine the variation of the system through time using a quantity called entropy. Entropy has been widely used in information theory and coding theory. The entropy of a given density function $f$ is defined by

$$H(f) = -\int f(x) \log f(x)\, dx.$$

Entropy is a measure of uncertainty. For example, the entropy of a normal distribution with mean $\mu$ and variance $\sigma^2$ is $\frac{1}{2}\left(\log(2\pi\sigma^2) + 1\right)$.

We shall not be concerned with the information-theoretic aspects of entropy in this article. Instead, we use it in its original role of describing the variability of a dynamic process. We now prove the following theorem, which prescribes the change of entropy.

Theorem 3.5. Assume that the conservation law holds with the movement described in equation (3.1). If

(3.28)

(3.29)

then

$$\frac{dH(f_t)}{dt} \le 0. \tag{3.30}$$

Proof: Consider the first-order derivative of $H(f_t)$ with respect to $t$, where $f_t = f(\cdot, t)$ satisfies (3.14). We then have

$$\frac{dH(f_t)}{dt} = -\int \frac{\partial f}{\partial t}\, (1 + \log f)\, dx = a^2 \int \nabla^2 f\, (1 + \log f)\, dx = -a^2 \int \frac{\|\nabla f\|^2}{f}\, dx \le 0.$$

The last step follows from integration by parts and the assumption of the theorem. □

This shows that the entropy is non-increasing at all times, so these dynamic clustering methods violate the second law of thermodynamics. Dynamic shrinking and clustering therefore do not correspond to a natural process and are unstable in nature, except for normal densities. Without the intervention of supervision, these artificial, locally defined laws could produce chaotic and unpredictable movements.
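The decrease of entropy in Theorem 3.5 can be observed numerically. The sketch below assumes $a = 1$ and a hypothetical two-component mixture of the form (3.25) with centers $\pm 3$ and equal weights. It evaluates the entropy on a fine uniform grid at $t = -1, -0.5, -0.25$, and it also checks the grid entropy of a single normal density against the closed form $\frac{1}{2}(\log(2\pi\sigma^2) + 1)$.

```python
import numpy as np

def mixture_density(x, t, mus=(-3.0, 3.0), lams=(0.5, 0.5), a=1.0):
    """1-D equal-variance normal mixture (3.25) with variance -2 a^2 t, t < 0."""
    var = -2.0 * a * a * t
    f = np.zeros_like(x)
    for mu, lam in zip(mus, lams):
        f += lam * np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    return f

def entropy(f, dx):
    """H(f) = -integral of f log f on a uniform grid (cells with f = 0 are skipped)."""
    pos = f > 0
    return -np.sum(f[pos] * np.log(f[pos])) * dx

x = np.linspace(-20.0, 20.0, 40001)
dx = x[1] - x[0]
times = [-1.0, -0.5, -0.25]
H = [entropy(mixture_density(x, t), dx) for t in times]   # decreasing as t -> 0^-

# Single normal with sigma^2 = -2 a^2 t = 1 at t = -0.5, versus the closed form
h_normal = entropy(mixture_density(x, -0.5, mus=(0.0,), lams=(1.0,)), dx)
h_closed = 0.5 * (np.log(2.0 * np.pi * 1.0) + 1.0)
```

As the components contract toward their centers, each entropy value is smaller than the previous one, in line with (3.30).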

4. Convergence with Supervision. We have established that convergence of unsupervised shrinking or clustering can only be achieved for independent normal random variables with equal variances. A natural question is whether this instability or non-convergence can be overcome by some kind of supervision. Mathematically, this can be modeled by imposing a so-called sink function in the PDE, in the following form:

$$\frac{\partial f(x,t)}{\partial t} + a^2 \nabla^2 f(x,t) = \psi(x,t), \tag{4.1}$$

where $\psi$ is a continuous function.

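Theorem 4.1. Under assumption (3.1), the PDE (4.1) associated with supervised clustering has the solution

$$f(x,t) = (-4\pi a^2 t)^{-n/2} \int_{\mathbb{R}^n} \phi_0(\eta)\, e^{(\eta - x)^2/(4a^2 t)}\, d\eta \;-\; \int_t^0 \int_{\mathbb{R}^n} \psi(\xi,\tau) \left[-4\pi a^2 (t-\tau)\right]^{-n/2} e^{(x-\xi)^2/(4a^2(t-\tau))}\, d\xi\, d\tau, \qquad t < 0. \tag{4.2}$$

If the clustering process converges to $m$ distinct focal points, then

$$f(x,t) = \sum_{j=1}^{m} \lambda_j \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{n} e^{-(x-\mu_j)^2/(2\sigma^2)} \;-\; \int_t^0 \int_{\mathbb{R}^n} \psi(\xi,\tau) \left[-4\pi a^2 (t-\tau)\right]^{-n/2} e^{(x-\xi)^2/(4a^2(t-\tau))}\, d\xi\, d\tau, \qquad t < 0, \tag{4.3}$$

where $\sigma^2 = -2a^2 t$, and $\lambda_j$ and $\mu_j$ are as in (3.26).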

Proof: (i) The general solution of the PDE with the sink function can be decomposed into two parts:

$$f(x,t) = g_1(x,t) + g_2(x,t), \tag{4.4}$$

where $g_1$ is the solution of the PDE (3.15) with boundary condition $g_1|_{t=0} = \phi_0$, and $g_2$ satisfies the PDE (4.1) with boundary condition $g_2|_{t=0} = 0$.
(ii) The functional form of $g_1$ is given by Theorem 3.3. That is,

$$g_1(x,t) = (-4\pi a^2 t)^{-n/2} \int_{\mathbb{R}^n} \phi_0(\eta)\, e^{(\eta - x)^2/(4a^2 t)}\, d\eta,$$

where $(\eta - x)^2 = \sum_{i=1}^{n} (\eta_i - x_i)^2$.

(iii) To find $g_2$, we consider a nonhomogeneous differential equation of the form

$$L_x u(x,t) = \psi(x,t), \tag{4.5}$$

where $L_x$ is the linear partial differential operator associated with the original PDE, that is,

$$L_x u(x,t) = \frac{\partial u(x,t)}{\partial t} + a^2 \nabla^2 u(x,t). \tag{4.6}$$

The Green function $G(x,t;\xi,\tau)$ of this problem satisfies

$$L_x G(x,t;\xi,\tau) = \delta(x - \xi)\, \delta(t - \tau), \tag{4.7}$$

with $G|_{t=0} = 0$. By Duhamel's principle, the solution of equation (4.5) with $u|_{t=0} = 0$ is then given by

$$u(x,t) = -\int_t^0 \int_{\mathbb{R}^n} \psi(\xi,\tau)\, G(x,t;\xi,\tau)\, d\xi\, d\tau, \tag{4.8}$$

where, for $t \le \tau$, the Green function satisfies the homogeneous PDE

$$\frac{\partial G(x,t)}{\partial t} + a^2 \nabla^2 G(x,t) = 0, \qquad G|_{t=\tau} = \delta(x - \xi).$$

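By Theorem 3.3, with $t$ replaced by $t - \tau$, the Green function is

$$G(x,t;\xi,\tau) = \left[-4\pi a^2 (t-\tau)\right]^{-n/2} \int_{\mathbb{R}^n} \delta(\eta - \xi)\, e^{(\eta - x)^2/(4a^2(t-\tau))}\, d\eta = \left[-4\pi a^2 (t-\tau)\right]^{-n/2} e^{(x-\xi)^2/(4a^2(t-\tau))}, \tag{4.9}$$

where $(x - \xi)^2 = \sum_{i=1}^{n} (x_i - \xi_i)^2$. It then follows that

$$g_2(x,t) = -\int_t^0 \int_{\mathbb{R}^n} \psi(\xi,\tau) \left[-4\pi a^2 (t-\tau)\right]^{-n/2} e^{(x-\xi)^2/(4a^2(t-\tau))}\, d\xi\, d\tau. \tag{4.10}$$

Therefore,

$$f(x,t) = \frac{1}{M} \left[ (-4\pi a^2 t)^{-n/2} \int_{\mathbb{R}^n} \phi_0(\eta)\, e^{(\eta - x)^2/(4a^2 t)}\, d\eta \;-\; \int_t^0 \int_{\mathbb{R}^n} \psi(\xi,\tau) \left[-4\pi a^2 (t-\tau)\right]^{-n/2} e^{(x-\xi)^2/(4a^2(t-\tau))}\, d\xi\, d\tau \right], \qquad t < 0, \tag{4.11}$$

where $M$ is the normalizing constant that ensures $f(x,t)$ is a proper probability density function. □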
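The sink term can be checked in one dimension. The sketch below is ours: it assumes $a = 1$, $\phi_0 = 0$ (so only $g_2$ remains) and a hypothetical time-independent source $\psi(x) = e^{-x^2/2}$. With $s = \tau - t$, the inner integral over $\xi$ in $g_2(x,t) = -\int_t^0 \int \psi(\xi,\tau)\, G(x,t;\xi,\tau)\, d\xi\, d\tau$ has the closed form $(1 + 2a^2 s)^{-1/2} e^{-x^2/(2(1+2a^2 s))}$, so only the integral over $s \in [0, -t]$ is evaluated numerically, by the midpoint rule. Finite differences then confirm that $g_2$ satisfies (4.1) and vanishes at $t = 0$.

```python
import numpy as np

A = 1.0  # assumed anti-diffusion coefficient a

def psi(x):
    """Hypothetical time-independent source term psi(x) = exp(-x^2 / 2)."""
    return np.exp(-x ** 2 / 2.0)

def g2(x, t, n_nodes=4000):
    """g2(x, t) for t <= 0 with phi_0 = 0, using the closed-form xi integral."""
    ds = -t / n_nodes
    s = (np.arange(n_nodes) + 0.5) * ds       # midpoints of [0, -t], where s = tau - t
    v = 1.0 + 2.0 * A * A * s                 # variance of psi convolved with the kernel
    inner = np.exp(-x ** 2 / (2.0 * v)) / np.sqrt(v)
    return -np.sum(inner) * ds

def residual(x, t, h=1e-3):
    """Finite-difference value of dg2/dt + a^2 d2g2/dx2 - psi(x); should be near 0."""
    g_t = (g2(x, t + h) - g2(x, t - h)) / (2.0 * h)
    g_xx = (g2(x + h, t) - 2.0 * g2(x, t) + g2(x - h, t)) / h ** 2
    return g_t + A * A * g_xx - psi(x)

r = residual(0.5, -1.0)   # only finite-difference and quadrature error remain
```

Flipping the sign in front of the integral in `g2` makes the residual equal to $-2\psi(x)$ instead of 0, which is a quick way to check the sign convention of the Duhamel term.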

Since the solution $f$ of the PDE is a functional of the supervision function $\psi$, correct convergence depends on the choice of the supervising function. The theorem therefore indicates that a universally effective supervising function does not exist, and a supervising function must be chosen judiciously to ensure correct convergence.

We remark that some dynamic shrinking and clustering algorithms could be stable because of external sink functions, which amount to a violation of the conservation law. One possible scenario is the crystallization process described in Teran and Bill (2010). It is stable because particles accumulate and are transformed into a solid with zero speed as they crystallize.
5. Discussion. Convergence and stability analyses of dynamic clustering methods are almost nonexistent in the literature. By using the conservation law, we establish partial differential equations that prescribe the spatial and temporal variations and evolutions of these clustering processes. We show that, in the absence of a sink function or other intervention, many dynamic clustering methods, including the well-known mean-shift algorithm, do not converge correctly in general unless all variables involved are independent and normally distributed. The non-increasing entropy and the anti-diffusive nature of many dynamic clustering methods prevent them from being universally reliable without proper supervision. Supervised dynamic clustering should be preferred, and the supervising function must be chosen carefully to ensure valid results.

We emphasize that an artificial convergence can be generated by setting certain parameters, such as the radius or smoothing parameter, to particular values. For example, a global convergence to one focal point can be achieved if the parameters are set such that all data points are forced into one cluster. A practical convergence can also be produced by many other suitable parameter values. Although such convergence can be achieved, it is usually not meaningful, since significant local geometrical properties are not respected in those circumstances.
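The effect of the smoothing parameter can be seen with a blurring mean-shift iteration in one dimension. In the sketch below, the Gaussian kernel, the six data points and the two bandwidths are illustrative assumptions of ours. With a small bandwidth the two groups converge to separate points near $-3$ and $3$, while a large bandwidth forces every point into a single cluster at $0$, an artificial convergence of the kind described above.

```python
import numpy as np

def blurring_mean_shift_1d(points, h, n_iter=50):
    """Blurring mean shift in 1-D with an assumed Gaussian kernel of bandwidth h."""
    x = np.asarray(points, dtype=float)
    for _ in range(n_iter):
        k = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * h * h))
        x = k @ x / k.sum(axis=1)   # every point moves to its kernel-weighted mean
    return x

data = np.array([-3.2, -3.0, -2.8, 2.8, 3.0, 3.2])
small_h = blurring_mean_shift_1d(data, h=0.5)    # two groups remain, near -3 and 3
large_h = blurring_mean_shift_1d(data, h=10.0)   # everything collapses to one point
```

Both runs "converge", but only the first respects the two-cluster geometry of the data.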

A semi-dynamic or static clustering method could produce correct convergence if only one data point is allowed to move while the entire probability density function remains unchanged; the process is then dynamic only for the chosen data point. This class of methods is often not preferred because of its slow convergence.

Work is in progress to extend the framework proposed in this article to 
provide theoretical analysis for other clustering methods. 